Social Media Analytics: The Kosmix Story
Authors
Abstract
Kosmix was a Silicon Valley startup founded in 2005 by Anand Rajaraman and Venky Harinarayan. Initially targeting Deep Web search, in early 2010 Kosmix shifted its main focus to social media and built a large infrastructure to perform social media analytics for a variety of real-world applications. In 2011 Kosmix was acquired by Walmart and converted into @WalmartLabs, the advanced research and development arm of Walmart. The goals of the acquisition were to provide a core of technical people in the Valley and attract more, to help improve traditional e-commerce for Walmart, and to explore the future of e-commerce. This future looks increasingly social, mobile, and local. Accordingly, @WalmartLabs continues to develop the social media analytics infrastructure pioneered by Kosmix, and uses it to explore a range of social e-commerce applications. In this paper we describe social media analytics as carried out at Kosmix. While our framework can handle many types of social media data, for concreteness we focus mostly on tweets. Section 2 describes the analytics architecture, the applications, and the challenges. We describe in particular the Social Genome, a large real-time social knowledge base that lay at the heart of Kosmix and powered most of its applications. Section 3 describes how the Social Genome was built, using Wikipedia, a set of other data sources, and social media data. Section 4 describes how we classify and tag tweets, extract entities from tweets, and link them to a knowledge base. Section 5 describes how we detect and monitor events in the Twittersphere. Section 6 discusses how we process the high-speed Twitter stream using Muppet, a scalable distributed stream processing engine built in house [1]. Section 7 discusses lessons learned and related work, and Section 8 concludes. Parts of the work described here have been open sourced [1] and described in detail in recent papers [18, 23, 25, 32].
Similar resources
Muppet: MapReduce-Style Processing of Fast Data
MapReduce has emerged as a popular method to process big data. In the past few years, however, not just big data, but fast data has also exploded in volume and availability. Examples of such data include sensor data streams, the Twitter Firehose, and Facebook updates. Numerous applications must process fast data. Can we provide a MapReduce-style framework so that developers can quickly write su...
Entity Extraction, Linking, Classification, and Tagging for Social Media: A Wikipedia-Based Approach
Many applications that process social data, such as tweets, must extract entities from tweets (e.g., “Obama” and “Hawaii” in “Obama went to Hawaii”), link them to entities in a knowledge base (e.g., Wikipedia), classify tweets into a set of predefined topics, and assign descriptive tags to tweets. Few solutions exist today to solve these problems for social data, and they are limited in importa...
User-Generated Content in Social Media
This report documents the program and the outcomes of Dagstuhl Seminar 17301 “User-Generated Content in Social Media”. Social media have a profound impact on individuals, businesses, and society. As users post vast amounts of text and multimedia content every minute, the analysis of this user generated content (UGC) can offer insights to individual and societal concerns and could be beneficial ...
Social Media Visual Analytics for Emergency Management: A Systematic Mapping
Social media visual analytics are becoming important in helping emergency managers gain situation awareness and make better decisions. In this paper, we present a systematic mapping to understand how the field is structured, find out what research topics exist in social media visual analytics for emergencies, and understand what the visual analytics application categories in this area are. Thi...
Journal title:
- IEEE Data Eng. Bull.
Volume 36
Publication date: 2013